Comparing text-driven and speech-driven visual speech synthesisers
Authors
Abstract
We present a comparison of a text-driven and a speech-driven visual speech synthesiser. Both are trained on the same data, and both use the same Active Appearance Model (AAM) to encode and re-synthesise visual speech. Objective quality, measured using correlation, suggests the two approaches perform comparably, but subjective opinion ranks the text-driven approach significantly higher.
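The objective metric mentioned above can be illustrated with a minimal sketch: Pearson correlation between a ground-truth AAM parameter trajectory and a synthesised one, computed frame by frame. The trajectories and variable names below are hypothetical examples, not data from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical trajectories of one AAM mode over six video frames:
ground_truth = [0.0, 0.4, 0.9, 0.7, 0.2, -0.3]
synthesised  = [0.1, 0.5, 0.8, 0.6, 0.1, -0.2]

print(round(pearson(ground_truth, synthesised), 3))
```

A value near 1.0 indicates the synthesised trajectory closely tracks the ground truth; in practice such scores would be averaged over all AAM modes and test utterances.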
Similar papers
On evaluating synthesised visual speech
This paper describes issues relating to the subjective evaluation of synthesised visual speech. Two approaches to synthesis are compared: a text-driven synthesiser and a speech-driven synthesiser. Both synthesisers are trained using the same data and both use the same model for rendering the synthesised visual speech. Naturalness is used as a performance metric, and the naturalness of real visu...
Short-term and Long-term Impact of Video-driven Metapragmatic Awareness Raising on Speech Act Production: A Case of Iranian Intermediate EFL Learners
Do Text-to-Speech Synthesisers Pronounce Correctly? A Preliminary Study
This paper evaluates four commercial text-to-speech synthesisers used by dyslexic people to listen to and proofread text. Two evaluators listened to 704 common English words and determined whether each word was pronounced correctly. Where the evaluators agree on incorrect pronunciation, the proportion of correct pronunciations for the four synthesisers is in the range 98.9% to 99.6% of th...
Pragmatic comprehension of apology, request and refusal: An investigation on the effect of consciousness-raising video-driven prompts
Recent research in interlanguage pragmatics (ILP) has substantiated that some aspects of pragmatics are amenable to instruction in the second or foreign language classroom. However, there are still controversies over the most conducive teaching approaches and the required materials. Therefore, this study aims to investigate the relative effectiveness of conscio...
Text to Avatar in Multi-modal Human Computer Interface
In this paper, we present a new text-driven avatar system, which consists of three major components, a text-to-speech (TTS) unit, a speech driven facial animation (SDFA) unit and a text-to-sign language (TTSL) unit. A new visual prosody time control model and an integrated learning framework are proposed to realize synchronization among speech synthesis, face animation and gesture animation, wh...